A recent genome-wide association study of educational attainment identified three significant single 
nucleotide polymorphisms (SNPs) (rs9320913, rs11584700, and rs4851266). In this study, we expanded 
this previous work by investigating behavioral correlates of these SNPs in a Han Chinese sample 
(rs9320913 was not available in our data and was thus replaced by rs12202969, which is in high linkage 
disequilibrium [i.e., correlations of alleles] with the former, r² = 0.96 in the Han Chinese population based on 
the 1000 Genomes Project). Association analysis for individual SNPs showed significant associations 
between rs4851266 and a measure of language ability (Chinese word recognition), and between 
rs12202969 and a personality trait (fear of negative evaluation) and a measure of mathematical ability 
(number paired-associates learning). A polygenic score based on these three SNPs was also significantly 
associated with the measures of mathematical and language abilities. Specifically, educationally advan- 
taged alleles identified in the previous study were associated with less fear of negative evaluation and 
higher mathematical and language abilities in the current study. This exploratory study provides 
evidence of psychological mechanisms for the association between genes and educational attainment. 

© 2014 Elsevier Ltd. All rights reserved. 



1. Introduction 

Educational attainment usually refers to the highest degree of 
education that an individual has completed. Twin studies showed 
that educational attainment had moderate heritability, ranging 
from 18% to 77% across different studies (Branigan, McCallum, & 
Freese, 2013). 

A recent genome-wide association study, using a large Cauca- 
sian sample, showed that three independent single nucleotide 
polymorphisms (SNPs; rs9320913, rs11584700, and rs4851266) 
were associated with educational attainment (Rietveld et al., 
2013). They found positive associations between years of schooling 
and allele A of rs9320913 (located on Chromosome 6 at position 
98,691,454 bp near the gene LOC100129158), and positive associa- 
tions between college completion and allele A of rs11584700 
(located on Chromosome 1 at position 202,843,606 bp near the 
gene LRRN2 [leucine rich repeat neuronal 2]) and allele T of 
rs4851266 (located on Chromosome 2 at position 100,184,911 bp 
near the gene LOC150577). These researchers suggested that fol- 
low-up studies should use the candidate gene approach to explore 
the potential associations between these genetic variants and well- 
measured endophenotypes, such as personality and cognition 
(Flint & Munafo, 2013; Rietveld et al., 2013). 

Indeed, Ward et al. (2014) recently reported a significant asso- 
ciation between the composite score of the three educational 
attainment-related SNPs (i.e., rs9320913, rs11584700, and 
rs4851266) and children's school performance as measured by 
the Standard Assessment Tests (SATs) at age 13-14 years. More 
specifically, rs9320913 was found to be associated with both the 
English (p = 0.002) and mathematics scores (p = 0.015), but there 
were no significant associations between the other two SNPs (i.e., 
rs11584700 and rs4851266) and the test scores (p's > 0.05). 

The current study aimed to expand on the work by Ward et al. 
(2014) by examining possible associations between genetic vari- 
ants linked to educational attainment and various endopheno- 
types, including an extensive set of measures of personality, 
mental health, cognition, and mathematical and language abilities 
(see Methods for a brief description and see Chen et al., 2013 
Table S2 for details). Using data from an existing project (e.g., 
Chen et al., 2013), we analyzed the associations between these 
behavioral measures and three SNPs (i.e., rs12202969, 
rs11584700, and rs4851266). SNP rs12202969 was used instead 
of rs9320913 because the latter was not available in our data and 
these two SNPs are 8.5 kb apart and in extremely high linkage 
disequilibrium (LD), r² = .98 and .96 in European and Han Chinese 
populations, respectively, based on the data of the 1000 Genomes 
Project (http://browser.1000genomes.org). Allele A of rs9320913 
corresponds to allele A of rs12202969. The results should help eluci- 
date the underlying mechanisms for the genetic basis of educa- 
tional attainment. 

2. Material and methods 

2.1. Participants 

Of the original sample of 480 subjects (Chen et al., 2013), 
genetic data for the three SNPs were available for 342, who were 
the subjects of the current study. All subjects were healthy Han 
Chinese undergraduates (sophomores, mean age = 20.42 years, 
SD = .89, range 18-22 years old; 55% female) from Beijing Normal 
University in China. All subjects had normal or corrected-to-nor- 
mal vision and had no neurological or psychiatric history based 
on their self-report. They all signed written informed consent. This 
study was approved by the Institutional Review Board (IRB) of Bei- 
jing Normal University, China. 

Individuals in this sample were all unrelated to one another. To 
ensure that there was no stratification effect in our sample, we 
conducted the following analyses using PLINK software (Purcell 
et al., 2007). First, we used PLINK to calculate the genomic inflation 
factor (lambda values) based on the genome-wide data of our sub- 
jects. In the current study, lambda values were all near 1 indicating 
no stratification effect on the association results. Second, we 
assessed the stratification effects by PLINK via clustering individu- 
als into homogeneous subsets based on the genome-wide average pro- 
portion of alleles shared identity by state (IBS) between any two 
individuals. At the first level, we constrained the cluster solution 
to two classes, and the results showed that only one subject 
belonged to the second cluster, suggesting little systematic strati- 
fication. Deleting that subject made little difference to the results, 
so data for all 342 subjects were analyzed. 
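
As an illustration of the first check, the genomic inflation factor is conventionally defined as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below follows that textbook definition using only the Python standard library; it is not PLINK's implementation, and the function name and the evenly spaced p-values are hypothetical stand-ins for a genome-wide scan.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Lambda = median observed chi-square (1 df) / expected median under the null."""
    nd = NormalDist()
    # A two-sided p-value corresponds to a 1-df chi-square statistic z**2.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Median of the 1-df chi-square distribution (about 0.455).
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Hypothetical input: evenly spaced p-values, i.e. an idealized null scan
# with no stratification. Real input would be the genome-wide test p-values.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
print(f"lambda = {lam:.3f}")
```

Values of lambda close to 1, as reported above, indicate that the test statistics are not systematically inflated.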

2.2. Behavioral assessment 

To explore the potential behavioral correlates of educational 
attainment-related SNPs, we analyzed the data from a battery of 
behavioral measures including 17 personality and mental health 
self-reported measures, 18 cognitive tasks, 9 mathematical tasks, 
and 5 language tasks (see Chen et al. (2013) Table S2, for details). 
These measures have been widely used in previous research and 
proved to have good psychometric properties. Preliminary analy- 
ses resulted in three measures with significant (p < 0.01) associa- 
tions with the targeted SNPs (see Table S1 for complete results 
on all measures). Therefore, we describe the three measures in 
greater detail below. 

2.2.1. Personality: Fear of negative evaluation 

Fear of negative evaluation was measured by a personality 
questionnaire, the Brief Fear of Negative Evaluation (BFNE) scale. It 
includes 12 items measuring the degree to which people experi- 
ence apprehension at the prospect of being evaluated negatively 
(Leary, 1983). A sample item is "I often worry that I 
will say or do the wrong things." Each item is answered using a 
five-point Likert scale, ranging from 1 (not at all characteristic of 
me) to 5 (extremely characteristic of me). Previous research has 
shown that the scale has satisfactory reliability and construct 
validity (Leary, 1983). Similar to previous studies, Cronbach's 
alpha was 0.90 in the current study. 
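
For readers unfamiliar with the reliability index, Cronbach's alpha compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula; the response matrix is a small hypothetical example (four respondents, three items), not BFNE data, and the function name is our own.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent (all the same length)."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to per-item columns
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(person) for person in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical Likert responses (1-5), for illustration only.
toy = [[1, 2, 1], [3, 3, 4], [4, 5, 4], [2, 2, 3]]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.2f}")
```

Because the same variance estimator is used in the numerator and the denominator, population and sample variances give the same alpha.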

2.2.2. Mathematical ability: Number paired-associates learning 
This test was based on a study by Delazer et al. (2005) and a pre- 
vious study for measuring mathematical ability among Chinese 
college students (Wei, Yuan, Chen, & Zhou, 2012). An artificial 
operation '§' was defined as '12b - 9a + 70'. That is, 
a § b = 12b - 9a + 70. For example, 5 § 3 = 61 (because 
5 § 3 = 12 × 3 - 9 × 5 + 70). Two other examples of the equations 
were 2 § 1 = 64 and 4 § 2 = 58. Fifteen equations were created based 
on the artificial operation. The participants, however, were not 
given the above definition of the operation. Instead, they were 
briefly presented an expanded form of the operation (i.e., 
'a § b = (10 - a) × b - (b - a) × (b - a) - (a - 3) × (b - 1) + a × a + 
b × b + (100 - 10 × a - b) - 27'). Because of its complexity, the par- 
ticipants were not able to memorize the definition of the artificial 
operation. Instead, they were asked to memorize the associations 
between pairs of operands and their answers for the 15 equations. 
During the learning stage, an equation was presented in the middle 
of the screen for 10 s. After subjects memorized all equations, they 
were tested. During the test stage, participants needed to judge 
whether a given equation (e.g., '5 § 3 = 61', '5 § 3 = 64') was correct 
or not. Half of the trials in the test stage were correct equations, 
and the other half were incorrect equations. Each trial was pre- 
sented for 9 s. After the test stage, participants learned the equa- 
tions again and were tested again. The percentage of correct 
answers on the second test was analyzed. The split-half reliability 
of this test was 0.69. 
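
The equivalence of the compact and expanded forms of the artificial operation can be checked numerically. The sketch below assumes single-digit operands purely for illustration (the 15 operand pairs actually used are not listed here), and the function names are our own.

```python
def simple_form(a, b):
    """Compact definition: a § b = 12b - 9a + 70."""
    return 12 * b - 9 * a + 70


def expanded_form(a, b):
    """Expanded definition briefly shown to participants."""
    return ((10 - a) * b - (b - a) * (b - a) - (a - 3) * (b - 1)
            + a * a + b * b + (100 - 10 * a - b) - 27)


# Hypothetical operand range; every pair should give identical results.
mismatches = [(a, b) for a in range(1, 10) for b in range(1, 10)
              if simple_form(a, b) != expanded_form(a, b)]

# The three example equations given in the text.
examples = {(a, b): simple_form(a, b) for a, b in [(5, 3), (2, 1), (4, 2)]}
print(mismatches, examples)
```

Expanding the long expression by hand gives the same result: the a × b, a², and b² terms cancel, leaving 12b - 9a + 70.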

2.2.3. Language ability: Chinese word recognition 

Participants were asked to read aloud very low-frequency 
Chinese characters (50 characters in total). These characters 
were selected from a Chinese character psycholinguistic norm 
(Liu, Shu, & Li, 2007). This task was used in a previous study 
that measured Chinese reading ability (Mei et al., 2013). The 
subjects were told that they would see some Chinese characters 
and have 5 s to read each character aloud before they were 
prompted to move on to the next character. The experimenter 
pointed to each Chinese character one by one from left to right, 
asked the subject to read each character aloud within 5 s, and 
recorded the performance of the subject on the answer sheet. 
The number of characters recognized correctly was used as the 
index for this task. 

2.3. Genotyping 

A 4 ml venous blood sample was collected from each subject. 
Genomic DNA was extracted according to the standard method 
within 2 weeks after the blood sample was collected. All samples 
were genotyped using the standard Affymetrix genotyping proto- 
col (Affymetrix, Inc.). As described in Table 1, the allele frequencies 
of genotyped SNPs (rs11584700, rs4851266, and rs12202969) in 
our sample were very similar to those of Asian samples (e.g., Chinese 
and Japanese) in the HapMap dataset (www.hapmap.org [phase 
3]). All SNPs met the criteria of a call rate of >95%, Minor Allele Fre- 
quency (MAF) of >0.05, and Hardy-Weinberg equilibrium (HWE) of 
p > 0.05 in the current study. 
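
The MAF and HWE criteria can be illustrated with a short calculation from genotype counts. The sketch below uses the textbook 1-df chi-square goodness-of-fit test, which may differ from the test implemented in the genotyping or PLINK pipeline; the genotype counts are hypothetical values chosen to give a reference allele frequency near 0.38 in 342 subjects, and the function name is our own.

```python
from math import erfc, sqrt


def maf_and_hwe(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, chi-square, p-value) for one biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)            # frequency of allele A
    q = 1.0 - p                                # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)   # Hardy-Weinberg proportions
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(chi2 / 2.0))           # upper tail of chi-square with 1 df
    return min(p, q), chi2, p_value


# Hypothetical genotype counts (AA, AB, BB), for illustration only.
maf, chi2, p_hwe = maf_and_hwe(50, 160, 132)
passes_qc = maf > 0.05 and p_hwe > 0.05
print(f"MAF = {maf:.2f}, HWE p = {p_hwe:.2f}, passes QC: {passes_qc}")
```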

Table 1 

Allele frequencies of candidate SNPs shown by ethnic groups. Data were from the present study and the HapMap dataset (www.hapmap.org [phase 3]). 

                                                           Reference allele frequency
SNP         Chr  Position (bp)  Nearest gene  Ref. allele  Current study  CHB   JPT   CEU   ASW
rs12202969  6    98,682,944     LOC100129158  A            0.38           0.42  0.33  0.49  0.22
rs4851266   2    100,184,911    LOC150577     T            0.59           0.54  0.51  0.41  0.08
rs11584700  1    202,843,606    LRRN2         A            0.68           0.68  0.61  0.79  0.92

Note: Chr: chromosome; bp: base pair; Ref. allele: reference allele. Current study: Chinese sample of the present study; all other columns are HapMap samples. Population descriptors: CHB: Han Chinese in Beijing, China; JPT: Japanese in Tokyo, Japan; CEU: Utah 
residents with Northern and Western European ancestry from the Centre d'Etude du Polymorphisme Humain (CEPH) collection; ASW: African ancestry in Southwest USA. 

Table 2 

Significant associations between SNPs and behavioral phenotypes after controlling for age and sex (p < 0.01). 

                              (Personality)           (Mathematics)           (Language)
                              Fear of negative        Number paired-          Chinese word
                              evaluation              associates learning     recognition
                              M ± SD = 42.64 ± 9.48   M ± SD = 0.73 ± 0.11    M ± SD = 31.58 ± 7.77
SNP         Ref allele  Freq  β       t       p       β       t       p       β       t       p
rs12202969  A           0.38  -0.15   -2.69   0.0075* 0.19    3.23    0.0014* 0.02    0.26    0.7928
rs4851266   T           0.59  0.10    1.86    0.0640  0.03    0.58    0.5636  0.21    3.59    0.0004*
rs11584700  A           0.68  -0.01   -0.10   0.9233  -0.07   -1.21   0.2273  0.00    0.02    0.9810

Note: Ref allele: reference allele; Freq: frequency in the current data; significant associations (p < 0.01) are marked with an asterisk. 

2.4. Polygenic score 

Following the procedure used by Rietveld et al. (2013), we 
created a polygenic score based on the selected SNPs for each 
individual. The polygenic score for the ith individual was calcu- 
lated as 

$$S_i = \sum_{j} x_{ij} b_j,$$

where $x_{ij}$ is the number of copies of the effect allele for SNP j 
carried by individual i, and $b_j$ is 
the estimated SNP effect from the multiple-SNP analysis. Following 
their method, we created a polygenic score for each individual 
based on three SNPs (i.e., rs12202969, rs11584700, and 
rs4851266). Specifically, the polygenic score for each subject = (the 
number of copies of the A allele for rs12202969) × (0.101) + (the 
number of copies of the A allele for rs11584700) × (-0.095) + (the 
number of copies of the T allele for rs4851266) × (0.082). The $b_j$ 
estimates were based on the genetic association of EduYears data 
as reported in Table 1 and Table S9 of Rietveld et al. (2013). 
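
The weighted sum described above can be sketched in a few lines. The weights are the three values given in the text; the data layout (a dictionary of effect-allele counts per SNP), the function name, and the example subject are hypothetical.

```python
# Effect allele and weight for each SNP, as given in the text.
WEIGHTS = {
    "rs12202969": ("A", 0.101),
    "rs11584700": ("A", -0.095),
    "rs4851266": ("T", 0.082),
}


def polygenic_score(effect_allele_counts):
    """effect_allele_counts: SNP id -> copies (0, 1, or 2) of that SNP's effect allele."""
    score = 0.0
    for snp, (_allele, weight) in WEIGHTS.items():
        copies = effect_allele_counts[snp]
        if copies not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1, or 2")
        score += copies * weight
    return score


# Hypothetical subject: one A at rs12202969, two A at rs11584700, one T at rs4851266.
subject = {"rs12202969": 1, "rs11584700": 2, "rs4851266": 1}
score = polygenic_score(subject)
print(f"polygenic score = {score:.3f}")
```

For this hypothetical subject the score is 1 × 0.101 + 2 × (-0.095) + 1 × 0.082 = -0.007.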

2.5. Data analyses 

Quantitative trait genetic association analysis was carried out 
by using PLINK v1.07 (Purcell et al., 2007), including allelic associa- 
tion tests between individual SNPs and behavioral measures after 
controlling for sex and age. Because this was an exploratory study, 
we set the significance threshold at p < 0.01 without further cor- 
rection for multiple comparisons. Multiple regression analysis 
was used to examine the associations between a polygenic score 
based on the three SNPs and behavioral measures after controlling 
for sex and age. 

3. Results 

Table 2 shows the significant (p < 0.01) associations between 
three candidate SNPs (i.e., rs11584700, rs4851266, and 
rs12202969) and behavioral phenotypes after controlling for sex 
and age. Allele A of rs12202969 was associated with lower scores 
of fear of negative evaluation (β = -0.15, t = -2.69, p = 0.0075, 
Cohen's d = -0.31) and higher scores of number paired-associates 
learning (β = 0.19, t = 3.23, p = 0.0014, Cohen's d = 0.39). Allele T of 
rs4851266 was associated with higher scores of Chinese word rec- 
ognition (β = 0.21, t = 3.59, p = 0.0004, Cohen's d = 0.44). However, 
there was no association between rs11584700 and the three behav- 
ioral indices (p > 0.05). To test whether these associations 
were robust, we conducted additional analyses that included addi- 
tional covariates. Results showed that the associations between 
rs12202969 and number paired-associates learning, between 
rs12202969 and fear of negative evaluation, and between 
rs4851266 and Chinese word recognition remained significant 
(p < 0.05) after controlling for age, sex, intelligence [measured with 
Raven's Advanced Progressive Matrices (Raven, Raven, & Court, 
1998)], and parental educational attainment [the higher of the two 
parents' educational attainment (Santelli, Lowry, Brener, & Robin, 
2000)]. 

Similar to the findings using individual SNPs, after controlling 
for age and sex, the polygenic score based on these three SNPs 
was associated with higher scores of number paired-associates 
learning (β = 0.18, t = 2.93, p = 0.0037, Cohen's d = 0.35) and higher 
scores of Chinese word recognition (β = 0.13, t = 1.98, p = 0.0490, 
Cohen's d = 0.25). However, the association between the polygenic 
score and fear of negative evaluation was not significant 
(β = -0.03, t = -0.51, p = 0.6125, Cohen's d = -0.06). 
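
The text does not state how Cohen's d was obtained from the regression output. One common conversion, d = 2t / sqrt(df), comes close to the reported values and is sketched below as an assumption rather than as the authors' method. The degrees of freedom are also assumed: N - 4 for a model with an intercept, the SNP, age, and sex, with N = 311 being the number of subjects reported for fear of negative evaluation in the power analysis.

```python
from math import sqrt


def cohen_d_from_t(t, df):
    """Common t-to-d conversion; assumed here, not confirmed by the source."""
    return 2.0 * t / sqrt(df)


# Assumed df = N - 4 (intercept, SNP, age, sex) with N = 311.
d_fne = cohen_d_from_t(-2.69, 311 - 4)
print(f"d = {d_fne:.2f}")
```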

We also conducted a post hoc power analysis. In the current 
study, SNP rs12202969 accounted for 2.2% of the variance of fear 
of negative evaluation; SNP rs12202969 accounted for 3.8% of the 
variance of number paired-associates learning; and SNP rs4851266 
accounted for 5.3% of the variance of Chinese word recognition. 
Using the software Quanto (version 1.2.4) (i.e., a power calculator 
for genetic association studies developed by Drs. Gauderman and 
Morrison, http://biostats.usc.edu/Quanto.html), we calculated post 
hoc the number of subjects required when the desired power was 
set at 80% at a significance level of 0.05 (2-sided), and the power 
we had given our sample size. For SNP rs12202969 and fear of neg- 
ative evaluation, a sample size of 353 would yield 80% power; 
with our 311 subjects, the power was 75%. For SNP rs12202969 
and number paired-associates learning, a sample size of 203 would 
yield 80% power; with our 277 subjects, the power was 91%. For 
SNP rs4851266 and Chinese word recognition, a sample size of 144 
would yield 80% power; with our 265 subjects, the power was 96%. 
Therefore, the statistical power was adequate (or, for fear of negative 
evaluation, close to adequate) for exploring the asso- 
ciations between the individual SNPs and three phenotypes above. 

Finally, the polygenic score accounted for 3.5% of the variance of 
number paired-associates learning, 2.4% of the variance of Chinese 
word recognition, and 0.2% of the variance of fear of negative eval- 
uation. For the polygenic score and number paired-associates 
learning, a sample size of 213 would yield 80% power; with 
our 277 subjects, the power was 89%. For the polygenic score 
and Chinese word recognition, a sample size of 313 would yield 
80% power; with our 265 subjects, the power was 73%. For the 
small association between the polygenic score and fear of negative 
evaluation, a sample size of 3822 would be required to yield 80% 
power, so our sample of 311 subjects was underpowered (12%). In sum, 
our sample size was adequate to detect modest effects but not 
small effects. 
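
The relation between variance explained, sample size, and power can be approximated with a simple normal-theory calculation. The sketch below is not Quanto's algorithm; it is a rough approximation under the assumption of a single predictor explaining a proportion r2 of the variance, so its output comes close to, but does not exactly match, the figures reported above. Function names are our own.

```python
from math import ceil, sqrt
from statistics import NormalDist

_ND = NormalDist()


def approx_power(r2, n, alpha=0.05):
    """Approximate two-sided power to detect an effect explaining r2 of the variance."""
    z_crit = _ND.inv_cdf(1.0 - alpha / 2.0)
    noncentrality = sqrt(n * r2 / (1.0 - r2))
    # The negligible contribution of the opposite tail is ignored.
    return _ND.cdf(noncentrality - z_crit)


def approx_n_for_power(r2, power=0.80, alpha=0.05):
    """Approximate sample size needed to reach the desired power."""
    z_crit = _ND.inv_cdf(1.0 - alpha / 2.0)
    z_power = _ND.inv_cdf(power)
    return ceil((z_crit + z_power) ** 2 * (1.0 - r2) / r2)


# rs12202969 and fear of negative evaluation: 2.2% of variance, 311 subjects.
power_fne = approx_power(0.022, 311)
n_fne = approx_n_for_power(0.022)
print(f"power = {power_fne:.2f}, N for 80% power = {n_fne}")
```

Under this approximation the required sample size grows roughly in inverse proportion to the variance explained, which is why the very small polygenic effect on fear of negative evaluation would need thousands of subjects.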



4. Discussion 

In the present study, we explored behavioral mechanisms that 
might explain the role of three SNPs in educational attain- 
ment as found in a previous GWAS (Rietveld et al., 2013). Results 
showed significant associations of rs12202969 with a personality 
trait (fear of negative evaluation) and a measure of mathematical 
ability (number paired-associates learning), and of rs4851266 with 
a measure of language ability (Chinese word recognition). Educa- 
tionally advantaged alleles identified in the previous study were 
associated with less fear of negative evaluation, better number 
paired-associates learning, and better Chinese word recognition 
in the current study. Furthermore, the polygenic score based on 
the three SNPs was associated with the mathematical and language 
scores. These results suggested that these specific skills might be 
the behavioral mechanisms underlying the link between the three 
SNPs and educational attainment. In the following paragraphs, we 
discuss these behavioral mechanisms in the context of relevant 
research literature. 

First, fear of negative evaluation, as measured by the Brief Fear 
of Negative Evaluation (BFNE) scale, reflects people's worries about 
failing in the eyes of other people (Leary, 1983). Consequently, 
individuals scoring high on this scale would have fear of failure, 
characterized by habitual feelings of worry, unpleasant tension, 
and lack of confidence about future performance. Individuals with 
such fears would have difficulty functioning during an important 
test. Indeed a recent study showed that subjects with greater fear 
of failure had lower educational attainment (Kuyper, Van der Werf, 
& Lubbers, 2000). Supporting the role of genes in such personality 
traits, a twin study showed that the personality trait of fear of neg- 
ative evaluation had a heritability of about 42% (Stein, Jang, & 
Livesley, 2002). Taking the results of our study together with those 
of Rietveld et al. (2013), it appears that individuals with allele A 
of rs12202969 were more likely to have lower fear of negative 
evaluation and consequently higher levels of eventual educational 
attainment. 

Second, mathematical skills are certainly critical for educational 
attainment (Kaufman, Kaufman, Liu, & Johnson, 2009). We found 
that number paired-associates learning was a significant behavioral 
correlate of rs12202969. Because our sample consisted of college 
students, it made sense that 
basic number cognition might not have played an important role. 
Indeed, a recent study that investigated the correlates of college stu- 
dents' performance on a test of advanced mathematics showed that 
number paired-associates learning was positively correlated with 
advanced mathematics even after controlling for general cognitive 
processing, basic numerical processing, spatial processing, and lan- 
guage processing (Wei et al., 2012). Our finding that individuals with 
allele A of rs12202969 were better at number paired-associates 
learning than other genotypes was consistent with the results found 
by Ward et al. (2014). This association could explain the finding of Rietveld 
et al. (2013) about this allele's role in educational attainment. 

Third, language ability is also an important predictor of educa- 
tional attainment (Deary, Strand, Smith, & Fernandes, 2007). Inter- 
estingly, a recent twin study of a Chinese sample suggested that 
Chinese word recognition (i.e., the ability to read Chinese words 
aloud correctly) had the highest heritability (73%) as compared to 
many other Chinese language abilities (Chow, Ho, Wong, Waye, & 
Bishop, 2011). We found that subjects with allele T of rs4851266 
were more likely than those with other genotypes to score higher 
on the Chinese word recognition test. Therefore, this allele may have 
resulted in an advantage in native language learning and hence in 
educational attainment as found by Rietveld et al. (2013). 

Finally, taking the contributions of all three SNPs together, the 
results seemed consistent across all three studies (ours as well as 
those of Rietveld et al., 2013, and Ward et al., 2014): The polygenic 
score of the three SNPs was associated with academic performance 
and hence educational attainment. Further supporting the aca- 
demic skills-specific explanation of the links between the three 
SNPs and educational attainment proposed by Ward et al. 
(2014), we had null findings for almost all other general measures 
of personality and cognitive abilities. 

Several limitations of this study need to be noted. First, the 
physiological functions of the identified SNPs linked to educational 
attainment were not directly explored in the present study or two 
previous studies (Rietveld et al., 2013; Ward et al., 2014). Addi- 
tional molecular functional studies are needed to investigate the 
biochemical mechanisms underlying this association. Second, it 
should be noted that the current study was based on a sample of 
healthy Han Chinese college students. The use of a normal healthy 
college sample to investigate the gene-behavioral associations has 
both strengths and weaknesses. With a homogeneous sample (in 
terms of age range, ethnicity, physical and mental health status, 
cognitive abilities, etc.) like Chinese college students, our results 
were less likely than those from heterogeneous samples to be con- 
founded by group differences (or population stratification). More- 
over, with the limited variance in our dependent variables such 
as cognitive abilities, the significant effects we found probably rep- 
resented conservative estimates of the true effects. Nevertheless, 
given the particular nature of our sample, our results may or 
may not be generalizable to other populations. Future research 
should replicate our findings with various samples such as com- 
munity samples of different ethnicities. Third, this was an explor- 
atory study to identify potential behavioral mechanisms 
underlying the previously documented gene-behavior associations, 
so we used an arbitrary and lenient threshold of p < 0.01 without 
using more stringent corrections for multiple comparisons. These 
results are thus preliminary and their value is to provide some 
bases for future hypothesis-testing research. Finally, we used the 
candidate gene approach to examine the associations between 
three educational attainment-related SNPs and behavioral pheno- 
types in the current study. However, both educational attainment 
and its endophenotypes are influenced by multiple genetic variants 
(Bae et al., 2013; Docherty et al., 2010). For example, a previous 
study showed that the personality trait of novelty seeking is a medi- 
ator in the relationship between the dopamine D4 receptor (DRD4) 
gene and predisposition to higher education (Keltikangas-Jarvinen, 
Elovainio, Kivimaki, Ekelund, & Peltonen, 2002). Future studies 
should examine other genetic variants that contribute to both edu- 
cational attainment and its endophenotypes. 

In conclusion, this study provides evidence for behavioral 
mechanisms involved in the association between genes and educa- 
tional attainment. The effects of different SNPs on educational 
attainment seemed to be mediated by different behaviors, such as 
personality traits and mathematical and language abilities. 